NSF PAR Search | NSF Public Access Repository

Real-Time Adaptive Motion Planning via Point Cloud-Guided, Energy-Based Diffusion and Potential Fields

https://doi.org/10.1109/LRA.2025.3592048

Teshome, Wondmgezahu; Behzad, Kian; Camps, Octavia; Everett, Michael; Siami, Milad; Sznaier, Mario (September 2025, IEEE Robotics and Automation Letters)

Free, publicly-accessible full text available September 1, 2026
3D-HGS: 3D Half-Gaussian Splatting

Li, Haolin; Liu, Jinyang; Sznaier, Mario; Camps, Octavia (June 2025, IEEE)

Free, publicly-accessible full text available June 15, 2026
InCoDe: Interpretable Compressed Descriptions for Image Generation

Comas, Armand; Chattopadhyay, Aditya; Formosa, Feliu; Liu, Changyu; Camps, Octavia; Vidal, Rene (May 2025, ICLR)

Free, publicly-accessible full text available May 1, 2026
Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers

https://doi.org/10.1109/CVPR52733.2024.02171

Liu, Jinyang; Teshome, Wondmgezahu; Ghimire, Sandesh; Sznaier, Mario; Camps, Octavia (June 2024, IEEE)

Full Text Available
Identifying the dynamics of interacting objects with applications to scene understanding and video temporal manipulation

https://doi.org/10.1016/j.ifacol.2024.08.545

Comas, Armand; Fernandez, Christian; Ghimire, Sandesh; Li, Haolin; Camps, Octavia; Sznaier, Mario (January 2024, IFAC-PapersOnLine)

Full Text Available
Towards Seamless Egocentric Hand Action Recognition in Mixed Reality

https://doi.org/10.1109/ISMAR-Adjunct60411.2023.00088

Reza, Sakib; Zhang, Yuexi; Camps, Octavia; Moghaddam, Mohsen (October 2023, IEEE)

Full Text Available
Enhancing Transformer Backbone for Egocentric Video Action Segmentation

Reza, Sakib; Sundareshan, Balaji; Moghaddam, Mohsen; Camps, Octavia (May 2023, arXiv)

Egocentric temporal action segmentation in videos is a crucial task in computer vision with applications in various fields such as mixed reality, human behavior analysis, and robotics. Although recent research has utilized advanced visual-language frameworks, transformers remain the backbone of action segmentation models. Therefore, it is necessary to improve transformers to enhance the robustness of action segmentation models. In this work, we propose two novel ideas to enhance the state-of-the-art transformer for action segmentation. First, we introduce a dual dilated attention mechanism to adaptively capture hierarchical representations in both local-to-global and global-to-local contexts. Second, we incorporate cross-connections between the encoder and decoder blocks to prevent the loss of local context by the decoder. We also utilize state-of-the-art visual-language representation learning techniques to extract richer and more compact features for our transformer. Our proposed approach outperforms other state-of-the-art methods on the Georgia Tech Egocentric Activities (GTEA) and HOI4D Office Tools datasets, and we validate our introduced components with ablation studies. The source code and supplementary materials are publicly available on https://www.sail-nu.com/dxformer.
Full Text Available
Generic Action Start Detection

https://doi.org/10.1109/MIPR54900.2022.00074

Zhang, Yuexi; Chen, Ming; Li, Yikang; Hsiao, Jenhao; Camps, Octavia; Ho, Chiuman (August 2022, IEEE 5th Int. Conf. on Multimedia Information Processing and Retrieval (MIPR))

Full Text Available
Key Frame Proposal Network for Efficient Pose Estimation in Videos

https://doi.org/10.1007/978-3-030-58520-4_36

Zhang, Yuexi; Wang, Yin; Camps, Octavia; Sznaier, Mario (November 2020, European Conference on Computer Vision)

Full Text Available
Dynamic Motion Representation for Human Action Recognition

https://doi.org/10.1109/WACV45572.2020.9093500

Asghari-Esfeden, Sadjad; Sznaier, Mario; Camps, Octavia (March 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV))

Despite the advances in Human Activity Recognition, the ability to exploit the dynamics of human body motion in videos has yet to be achieved. In numerous recent works, researchers have used appearance and motion as independent inputs to infer the action that is taking place in a specific video. In this paper, we highlight that while using a novel representation of human body motion, we can benefit from appearance and motion simultaneously. As a result, better performance of action recognition can be achieved. We start with a pose estimator to extract the location and heat-map of body joints in each frame. We use a dynamic encoder to generate a fixed size representation from these body joint heat-maps. Our experimental results show that training a convolutional neural network with the dynamic motion representation outperforms state-of-the-art action recognition models. By modeling distinguishable activities as distinct dynamical systems and with the help of two stream networks, we obtain the best performance on HMDB, JHMDB, UCF-101, and AVA datasets.
Full Text Available
